89 research outputs found

    Generalized Error Exponents For Small Sample Universal Hypothesis Testing

    The small sample universal hypothesis testing problem is investigated in this paper, in which the number of samples $n$ is smaller than the number of possible outcomes $m$. The goal of this work is to find an appropriate criterion to analyze statistical tests in this setting. A suitable model for analysis is the high-dimensional model in which both $n$ and $m$ increase to infinity, and $n=o(m)$. A new performance criterion based on large deviations analysis is proposed and it generalizes the classical error exponent applicable for large sample problems (in which $m=O(n)$). This generalized error exponent criterion provides insights that are not available from asymptotic consistency or central limit theorem analysis. The following results are established for the uniform null distribution: (i) The best achievable probability of error $P_e$ decays as $P_e=\exp\{-(n^2/m) J (1+o(1))\}$ for some $J>0$. (ii) A class of tests based on separable statistics, including the coincidence-based test, attains the optimal generalized error exponents. (iii) Pearson's chi-square test has a zero generalized error exponent and thus its probability of error is asymptotically larger than the optimal test. Comment: 43 pages, 4 figures
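To make the two kinds of statistic named in this abstract concrete, here is a minimal sketch for a uniform null over `m` outcomes. It is not the paper's code: the function names are hypothetical, and taking the coincidence statistic to be the number of outcomes observed exactly once is an assumption, so the paper's exact definition may differ. Both statistics are sums of a per-outcome function of the counts, which is the sense of "separable" assumed here.

```python
from collections import Counter


def singleton_count(samples):
    """Number of outcomes observed exactly once (assumed coincidence-type statistic)."""
    counts = Counter(samples)
    return sum(1 for c in counts.values() if c == 1)


def pearson_chi_square(samples, m):
    """Pearson's chi-square statistic against the uniform distribution on m outcomes."""
    n = len(samples)
    expected = n / m  # expected count per outcome under the uniform null
    counts = Counter(samples)
    # Contribution of outcomes that were observed at least once.
    observed_part = sum((c - expected) ** 2 / expected for c in counts.values())
    # Each unobserved outcome contributes (0 - expected)^2 / expected = expected.
    unobserved_part = (m - len(counts)) * expected
    return observed_part + unobserved_part


if __name__ == "__main__":
    data = [0, 3, 3, 7, 9]  # n = 5 samples from an alphabet of m = 20 outcomes
    print(singleton_count(data))
    print(pearson_chi_square(data, m=20))
```

In the regime $n=o(m)$ most outcomes are never observed, so both statistics are driven by the few repeated outcomes; the sketch only shows how they are computed and does not reproduce the error-exponent comparison.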

    Feature Extraction for Universal Hypothesis Testing via Rank-constrained Optimization

    This paper concerns the construction of tests for universal hypothesis testing problems, in which the alternate hypothesis is poorly modeled and the observation space is large. The mismatched universal test is a feature-based technique for this purpose. In prior work it is shown that its finite-observation performance can be much better than the (optimal) Hoeffding test, and good performance depends crucially on the choice of features. The contributions of this paper include: 1) We obtain bounds on the number of $\epsilon$-distinguishable distributions in an exponential family. 2) This motivates a new framework for feature extraction, cast as a rank-constrained optimization problem. 3) We obtain a gradient-based algorithm to solve the rank-constrained optimization problem and prove its local convergence. Comment: 5 pages, 4 figures, submitted to ISIT 201

    Universal and Composite Hypothesis Testing via Mismatched Divergence

    For the universal hypothesis testing problem, where the goal is to decide between the known null hypothesis distribution and some other unknown distribution, Hoeffding proposed a universal test in the 1960s. Hoeffding's universal test statistic can be written in terms of Kullback-Leibler (K-L) divergence between the empirical distribution of the observations and the null hypothesis distribution. In this paper a modification of Hoeffding's test is considered based on a relaxation of the K-L divergence test statistic, referred to as the mismatched divergence. The resulting mismatched test is shown to be a generalized likelihood-ratio test (GLRT) for the case where the alternate distribution lies in a parametric family of distributions characterized by a finite dimensional parameter, i.e., it is a solution to the corresponding composite hypothesis testing problem. For certain choices of the alternate distribution, it is shown that both the Hoeffding test and the mismatched test have the same asymptotic performance in terms of error exponents. A consequence of this result is that the GLRT is optimal in differentiating a particular distribution from others in an exponential family. It is also shown that the mismatched test has a significant advantage over the Hoeffding test in terms of finite sample size performance. This advantage is due to the difference in the asymptotic variances of the two test statistics under the null hypothesis. In particular, the variance of the K-L divergence grows linearly with the alphabet size, making the test impractical for applications involving large alphabet distributions. The variance of the mismatched divergence on the other hand grows linearly with the dimension of the parameter space, and can hence be controlled through a prudent choice of the function class defining the mismatched divergence. Comment: Accepted to IEEE Transactions on Information Theory, July 201
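The Hoeffding statistic described in this abstract, the K-L divergence between the empirical distribution and the null distribution, can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary representation of the null distribution, and the threshold value are assumptions, and the mismatched divergence itself is not implemented here.

```python
import math
from collections import Counter


def kl_empirical_to_null(samples, null_pmf):
    """K-L divergence D(empirical || null) in nats, for a finite alphabet."""
    n = len(samples)
    total = 0.0
    for symbol, count in Counter(samples).items():
        p_hat = count / n
        q = null_pmf.get(symbol, 0.0)
        if q == 0.0:
            # An observed symbol with zero null probability gives infinite divergence.
            return math.inf
        total += p_hat * math.log(p_hat / q)
    return total


def hoeffding_rejects_null(samples, null_pmf, threshold):
    """Reject the null hypothesis when the divergence reaches the threshold."""
    return kl_empirical_to_null(samples, null_pmf) >= threshold


if __name__ == "__main__":
    null = {"a": 0.5, "b": 0.5}  # hypothetical null distribution
    print(kl_empirical_to_null(["a", "a", "b", "b"], null))
    print(hoeffding_rejects_null(["a", "a", "a", "a"], null, threshold=0.5))
```

The sum runs over observed symbols only, since unobserved symbols contribute zero to the divergence. The threshold is left as a free parameter because the abstract does not specify how it is chosen.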

    Reading Wu Cheng'en's Journey to the West (《西遊記》) as a Bildungsroman

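    Among the many editions of Journey to the West in circulation, this thesis uses the hundred-chapter edition published by Zhonghua Book Company (中華書局) as its main reference. Counted among the Four Great Classical Novels, Journey to the West is rich in plot, complete in structure, and vivid in characterization, and through its world of immortals and illusion it reflects the truth of human nature and the philosophy of life. Compared with earlier pilgrimage stories it is a great step forward: it blends plot elements from tales such as 《大唐三藏取經詩話》, 《大唐西域記》, and 《西遊記雜劇》, and goes further by developing sharply drawn characters. Sun Xingzhe (Sun Wukong) in the hundred-chapter edition seems endowed with strong human feelings; his experience of growing up is highly relatable, and what the novel portrays is essentially a human being rather than a demon monkey or a holy Buddha. The characters Wu Cheng'en created are still widely discussed today and have been adapted into films and television series that are well loved by the general public, which is the strongest evidence that they are vivid and close to us. Critical essays seeking to explain the appeal of this enduring masterpiece have appeared in a steady flood. This thesis sets aside textual criticism of the editions and instead adopts Western theories of personal growth, arranging Wukong's experiences from his birth, through Flower-Fruit Mountain, to his entry into the human world, and then analyzing them as an organic whole in order to present the successive changes in his development; for readers fond of Sun Wukong, this is an interesting research topic. The thesis is divided into three main parts. Chapter 1 introduces the theory of the Bildungsroman; Chapter 2 examines in depth the four stages of Sun Wukong's growth (infancy, adolescence, youth, and adulthood), drawing on relevant psychological theories to analyze his inner journey; the final chapter compares these findings with the "hero's journey" in Joseph Campbell's The Hero With A Thousand Faces, in order to summarize Wukong's transformation in a more structured way.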

    Statistical SVMs for robust detection, supervised learning, and universal classification

    The support vector machine (SVM) has emerged as one of the most popular approaches to classification and supervised learning. It is a flexible approach for solving the problems posed in these areas, but the approach is not easily adapted to noisy data in which absolute discrimination is not possible. We address this issue in this paper by returning to the statistical setting. The main contribution is the introduction of a statistical support vector machine (SSVM) that captures all of the desirable features of the SVM, along with desirable statistical features of the classical likelihood ratio test. In particular, we establish the following: (i) The SSVM can be designed so that it forms a continuous function of the data, yet also approximates the potentially discontinuous log likelihood ratio test. (ii) Extension to universal detection is developed, in which only one hypothesis is labeled (a semi-supervised learning problem). (iii) The SSVM generalizes the robust hypothesis testing problem based on a moment class. Motivation for the approach and analysis are each based on ideas from information theory. A detailed performance analysis is provided in the special case of i.i.d. observations. This research was partially supported by NSF under grant CCF 07-29031, by UTRC, Motorola, and by the DARPA ITMANET program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF, UTRC, Motorola, or DARPA.

    Penalty Dynamic Programming Algorithm for Dim Targets Detection in Sensor Systems

    To detect and track multiple maneuvering dim targets in sensor systems, an improved dynamic programming track-before-detect algorithm (DP-TBD), called penalty DP-TBD (PDP-TBD), is proposed. The performance of the tracking techniques is used as feedback to the detection part. The feedback takes the form of a penalty term in the merit function; the penalty term is a function of the possible target state estimate, which can be obtained by the tracking methods. With this feedback, the algorithm combines traditional tracking techniques with DP-TBD and can be applied to detect and track maneuvering dim targets simultaneously. In addition, a reasonable constraint, that a sensor measurement can originate from one target or from clutter, is proposed to minimize track separation. The algorithm can therefore be used in multi-target situations where the number of targets is unknown. The efficiency and advantages of PDP-TBD compared with two existing methods are demonstrated by several simulations.

    Metatranscriptomics Reveals the Functions and Enzyme Profiles of the Microbial Community in Chinese Nong-Flavor Liquor Starter

    Chinese liquor is one of the world's best-known distilled spirits and is the largest spirit category by sales. The unique and traditional solid-state fermentation technology used to produce Chinese liquor has been in continuous use for several thousand years. The diverse and dynamic microbial community in a liquor starter is the main contributor to liquor brewing. However, little is known about the ecological distribution and functional importance of these community members. In this study, metatranscriptomics was used to comprehensively explore the active microbial community members and key transcripts with significant functions in the liquor starter production process. Fungi were found to be the most abundant and active community members. A total of 932 carbohydrate-active enzymes, including highly expressed auxiliary activity family 9 and 10 proteins, were identified at 62°C under aerobic conditions. Some potential thermostable enzymes were identified at 50, 62, and 25°C (mature stage). Increased content and overexpressed key enzymes involved in glycolysis and starch, pyruvate and ethanol metabolism were detected at 50 and 62°C. The key enzymes of the citrate cycle were up-regulated at 62°C, and their abundant derivatives are crucial for flavor generation. Here, the metabolism and functional enzymes of the active microbial communities in NF liquor starter were studied, which could pave the way to initiate improvements in liquor quality and to discover microbes that produce novel enzymes or high-value-added products.